ABSTRACT

If damped Lyman alpha systems (DLAs) contain even modest amounts of dust, the ultraviolet
luminosity of the background quasar can be severely diminished. When the spectrum is
redshifted, this leads to a bias in optical surveys for DLAs. Previous estimates of the
magnitude of this effect are in some tension; in particular, the distribution of DLAs in the
$(N_{\rm HI}, Z)$ (i.e. column density - metallicity) plane has led to claims that we may be
missing a considerable fraction of metal rich, high column density DLAs, whereas radio
surveys do not unveil a substantial population of otherwise hidden systems.

Motivated by this tension, we perform a Bayesian parameter estimation analysis of a sim- 
ple dust obscuration model. We include radio and optical observations of DLAs in our overall 
likelihood analysis and show that these do not, in fact, constitute conflicting constraints. 

Our model gives statistical limits on the biasing effects of dust, predicting that only 7%
of DLAs are missing from optical samples due to dust obscuration; at $2\sigma$ confidence, this
figure takes a maximum value of 17%. This contrasts with recent claims that DLA incidence
rates are underestimated by 30-50%. Optical measures of the mean metallicities of DLAs
are found to underestimate the true value by just 0.1 dex (or at most 0.4 dex, $2\sigma$ confidence
limit), in agreement with the radio survey results of Akerman et al. As an independent test, we
use our model to make a rough prediction for dust reddening of the background quasar. We
find a mean reddening in the DLA rest frame of $\log_{10}\langle E_{B-V}\rangle = -2.4 \pm 0.6$, consistent with
direct analysis of the SDSS quasar population by Vladilo et al., $\log_{10}\langle E_{B-V}\rangle = -2.2 \pm 0.1$.
The quantity most affected by dust biasing is the total cosmic density of metals in DLAs,
$\Omega_{Z,{\rm DLA}}$, which is underestimated in optical surveys by a factor of approximately two.
1 INTRODUCTION

Damped Lyman alpha systems (DLAs), neutral gas with column
densities $N_{\rm HI} > 2 \times 10^{20}\,{\rm cm}^{-2}$ seen in absorption against more
distant luminous sources (generally quasars), are of substantial interest
to observational and computational cosmologists. Despite
disagreement over their precise nature, they are certain to trace a
set of objects which constrain our theories of galaxy formation.
This is guaranteed by the simple observational fact that they contain
the overwhelming majority of neutral hydrogen (a necessary
precursor to molecular hydrogen and therefore star formation) over
all redshifts $z > 0$ (Tytler 1987). For a review of observational and
theoretical results see Wolfe et al. (2005).

One area of controversy in the interpretation of DLA observations
is the extent to which biases are introduced by dust: it is
possible to imagine scenarios in which certain metal rich, high column
density DLAs dim their background quasars such that significant
fractions are not detected in optical surveys. Early attempts
at assessing the magnitude of this effect by comparing the spectral
slopes of QSOs with and without intervening DLAs seemed to
suggest that estimates of important quantities such as the total density
of neutral hydrogen ($\Omega_{\rm DLA}$) and the mean metallicity ($\langle Z\rangle$) of
DLAs could be incorrect by orders of magnitude (Fall & Pei 1993
and references therein). While recent results (e.g. Murphy & Liske
2004; Ellison et al. 2005; Vladilo et al. 2008) show that the extent
of dust reddening was substantially overestimated in these early
works, emphasis on the observational evidence in apparent support
of the obscuration scenario has shifted to the distribution of absorbers
in $(N_{\rm HI}, Z)$ space. First noted by Boissé et al. (1998), there
is a dearth of absorbers exhibiting simultaneously high $N_{\rm HI}$ and
high $Z$ - exactly as would be expected in a scenario invoking significant
dust absorption. Recent work on such models (e.g. Vladilo
& Péroux 2005) has suggested that smaller but still important effects
arise from dust obscuration. In particular, a dust-induced bias
has been invoked by simulators to reconcile high DLA metallicities
encountered in models with the generally low values measured
empirically (Cen et al. 2003; Nagamine et al. 2004).

One should be clear, however, that this interpretation is not
unique - at the tails of both the $N_{\rm HI}$ and $Z$ distributions, systems
will be rare anyway (see Figure 1, which shows similar distributions
arising from the different models described in Section 2.1).
Figure 1. A heuristic picture of different regions of our model space. Our optical sample based on Dessauges-Zavadsky et al. (in prep.) is shown by dots;
the CORALS radio sample based on Akerman et al. (2005) is shown by crosses or triangles for the upper limits (see Section 2.2 for a discussion of these
datasets). Contour lines of equal probability density in the $\log N_{\rm HI}$ - $\log Z$ plane for finding a DLA in an optical sample (solid lines) or radio sample (dashed
lines where this differs from the optical case) are shown for three models. In each panel, the contours correspond, from left to right, to 0.9, 0.8, ..., 0.2, 0.1
and 0.05 times the peak probability density. In the left panel, high column densities of hydrogen are intrinsically unlikely but there is no dust absorption;
in the central panel, the Schechter-type intrinsic cut-off is absent, but dust truncates the observed high column densities of metals. The final panel illustrates
the favoured type of model from our analysis, in which both intrinsic and obscuration mechanisms have some part to play in shaping the optically observed
distribution. By eye, the three sets of optical contours appear similar; this illustrates the need for a rigorous method to probe the role of dust in shaping the
final distribution.



Thus statistical interpretation of these apparent trends must be ap- 
proached with care. 

In fact, a range of constraints cast doubt on models predicting
a substantial bias. Starting from the observed relative abundances
of elements which are depleted onto dust by differing amounts,
such as zinc (undepleted) and chromium (severely depleted), Pettini
et al. (1997) estimated a DLA-induced extinction at 1500 Å of
just ~0.1 mag. More directly, samples of radio-selected QSOs
(which are unaffected by dust) exhibit similar incidence rates of
DLAs as their optical counterparts (Ellison et al. 2001; see also
Jorgenson et al. 2006, although the optical identification in this latter
work is not complete). Moreover, high resolution spectroscopy
of the Ellison et al. DLAs (known as the CORALS sample) shows
a similar distribution of metal column densities as found in optical
samples (Akerman et al. 2005). While the radio-selected samples
do show a marginal $1\sigma$ difference from the optical data in both
mean metallicity and incidence rate of DLAs (in the correct sense
for a dust obscuration bias signature), it is not at all clear whether
this is merely a statistical fluke. It is worth noting that, even if the
effect on measures such as the incidence rate and mean metallicity
is minor, weighted measures such as $\Omega_{\rm DLA}$ can be more critically
affected. At most risk of being underestimated is the total mass
of metals in DLAs, $\Omega_{Z,{\rm DLA}}$, which is observationally interesting
when conducting a census of metal enrichment over cosmic time
(Pettini 2006; Bouché et al. 2007 and references therein).

Overall, the previous work described above appears to be in 
some tension. Difficulty in understanding these tensions is exac- 
erbated by analyses using ad-hoc statistical methods or "by-eye" 
assertions. These problems motivated the present work in which 
we have taken a Bayesian parameter estimation approach to putting 
useful limits on the effects in question. In our analysis we have used 
four logically distinct observational datasets: an optically selected
sample of DLAs, a radio selected sample of DLAs, SDSS statistics
for the column densities of DLAs, and overall incidence rates
for DLAs in radio and optical surveys.

The Bayesian parameter estimation formalism requires us to 
(i) formulate a parameterized model describing the data and (ii) 
place prior probabilities on the distribution of parameters for the 
model. These two processes can, of course, give rise to contro- 
versy - especially when the physical processes in play are hard 
to model. In particular, stage (i) places a unit prior probability on 
our chosen model: we might humbly admit that this is not entirely 
satisfactory, but emphasize that the Bayesian technique does not introduce
but merely highlights such difficulties. We also performed
additional analysis on a widened parameter space which goes some
way to mitigating our concerns (see Appendix A). For more details
on the Bayesian technique see e.g. Jaynes & Bretthorst (2003).

The remainder of this paper is structured as follows. In section
2.1 we develop a basic model which we argue captures the significant
effects of dust-induced obscuration. We describe our use of
optical- and radio-selected survey results to calculate likelihoods
for this model in section 2.2. With some simple priors described in
section 2.3 we examine the resulting statistical estimates for completeness
of optical samples in section 3. Finally, we conclude that
dust biasing is a real but minor effect in section 4, in which we also
discuss how our technique and results differ from similar work by
Vladilo & Péroux (2005).
2 MODEL AND PARAMETER ESTIMATION

In this section, we will form a simple model for the observed be- 
haviour of absorbers with a continuous parameter which describes 
the extent to which dust obscuration plays a part. Performing pa- 
rameter estimation will then allow us to assess the effect of dust ab- 
sorption on the observed statistics. The final model has five param- 
eters, so we use a Metropolis-Hastings Markov chain Monte Carlo 
algorithm to sample the posterior probability distribution (Press
et al. 2007, and references therein).
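
For readers unfamiliar with the sampling step, the following Python sketch illustrates a random-walk Metropolis-Hastings sampler of the general kind referred to above. It is not the code used for the analysis: the Gaussian proposal, the step sizes, the function names and the five-dimensional toy posterior (`toy_log_posterior`) are all assumptions made purely for illustration.

```python
import math
import random


def metropolis_hastings(log_post, start, step_sizes, n_steps, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain = []
    accepted = 0
    for _ in range(n_steps):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        proposal_lp = log_post(proposal)
        # The proposal is symmetric, so the acceptance probability is simply
        # min(1, posterior ratio), evaluated here in log space.
        delta = proposal_lp - current_lp
        if delta >= 0.0 or rng.random() < math.exp(delta):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(list(current))
    return chain, accepted / n_steps


def toy_log_posterior(theta):
    # Hypothetical stand-in for log(likelihood x prior): five independent
    # unit Gaussians, one per model parameter.
    return -0.5 * sum(x * x for x in theta)


chain, acc_rate = metropolis_hastings(
    toy_log_posterior, start=[0.0] * 5, step_sizes=[0.5] * 5, n_steps=20000
)
```

In a real application `toy_log_posterior` would be replaced by the sum of the log-likelihood and log-prior, and marginalized constraints would be read off from histograms of the chain after discarding an initial burn-in.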

2.1 Model 

Our simple model starts from the assumption that the intrinsic distribution
of DLAs is separable in the $(N_{\rm HI}, Z)$ plane. Although
locally $N_{\rm HI}$ and the star formation rate may be expected to be
correlated (via the Schmidt-Kennicutt relation observed in local
galaxies, see e.g. Kennicutt 1998), our own N-body simulations
of galaxy formation (Pontzen et al. 2008), as well as previous simulations
(Cen et al. 2003; Nagamine et al. 2004), suggest that there
is no significant correlation between $N_{\rm HI}$ and the global star formation
history of the host galaxy and hence its metallicity. Thus



$$f_{\rm DLA}(N_{\rm HI}, Z) = f_N(N_{\rm HI})\, f_Z(Z) \tag{1}$$



where $f_{\rm DLA}(N_{\rm HI}, Z)$ gives the intrinsic probability density of a
DLA's location in the $(N_{\rm HI}, Z)$ plane, picked with no observational
biases. The distribution of column densities $f_N$ follows a Schechter
function (as suggested by Pei & Fall 1995)

$$f_N(N_{\rm HI}) = N_{\rm HI}^{\alpha}\, e^{-N_{\rm HI}/N_{\rm cut}} \tag{2}$$

where $\alpha$ measures the low column density slope and $N_{\rm cut}$ is a characteristic
cut-off column density. The distribution of metallicities
$f_Z$ is assumed lognormal

$$f_Z(Z) = \frac{1}{Z}\exp\left[-\frac{([Z]-\mu_Z)^2}{2\sigma_Z^2}\right] \tag{3}$$

where $[Z] = \log_{10} Z/Z_\odot$, $\mu_Z$ is the mean log metallicity and $\sigma_Z$ is
the standard deviation of the log metallicity. We have intentionally
not normalized our distribution functions at this stage.

These functional forms are based jointly on observational and 
simulated work - but of course the observations are from mag- 
nitude limited optical samples, so it is worth asking whether the 
intrinsic distributions could in fact have a substantially different 
shape; we have addressed this possibility in Appendix A (but find 
that our current parameterization is adequate given some fairly 
weak assumptions). 

2.1.1 Determination of dust column density 

We will assume that the optical depth of dust in any system may be
modelled as

$$\tau_{\rm dust}(\lambda, N_{\rm HI}, Z) = \tau_0(\lambda)\, N_{\rm HI}\, \frac{Z\,\mathcal{F}_{\rm Fe}(Z)}{Z_0\,\mathcal{F}_{\rm Fe}(Z_0)} \tag{4}$$

where $Z_0$ is a normalization metallicity, $\mathcal{F}_{\rm Fe}$ represents the varying
fraction of iron in the dust phase as a function of metallicity,
and $\tau_0(\lambda)$ specifies a linear scaling between dust column density
and optical depth at wavelength $\lambda$. This form is based on the fair
assumption that the DLA gas density is dominated by the neutral
hydrogen density (see Wolfe et al. 2005). Our results are actually
rather insensitive to the exact functional form of $\mathcal{F}_{\rm Fe}$, so long as
the fraction in dust increases gently with metallicity, but for ease
Figure 2. The adopted relation (from Vladilo & Péroux 2005) between the
metallicity and fraction of iron in dust is based on the observed relation
between metallicity and iron-to-zinc ratio (since zinc does not deplete onto
dust grains). Here, data from the optical sample described in Section
2.2 are plotted (dots and triangles for limits) along with the best fit model
(curve).



of comparison we have adopted the form suggested by Vladilo &
Péroux (2005):

$$\mathcal{F}_{\rm Fe} = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\left(\frac{[Z]-[Z]_0}{\Delta[Z]}\right) \tag{5}$$

Since zinc does not deplete onto dust, the ratio of iron to zinc column
densities is predicted by $\mathcal{F}_{\rm Fe}$:

$$[{\rm Fe}/{\rm Zn}] = \log_{10}\left(1 - \mathcal{F}_{\rm Fe}\right) \tag{6}$$

This relation can be used, along with observational constraints on
iron and zinc abundances in individual systems from the dataset
described below in Section 2.2, to estimate best fit parameters
$[Z]_0 = -1.3$ and $\Delta[Z] = 0.48$ in equation 5. For reference,
we have plotted this relationship in Figure 2.
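
The dust-fraction relation can be written down in a few lines. The Python sketch below assumes the arctangent form $\mathcal{F}_{\rm Fe} = 1/2 + (1/\pi)\tan^{-1}\{([Z]-[Z]_0)/\Delta[Z]\}$ together with the best fit values quoted above; the function names are illustrative and not taken from the original analysis.

```python
import math

LOG_Z0 = -1.3       # [Z]_0, best fit value quoted in the text
DELTA_LOG_Z = 0.48  # Delta[Z], best fit value quoted in the text


def iron_dust_fraction(log_z):
    """Fraction of iron in dust at log metallicity [Z] (assumed arctan form)."""
    return 0.5 + math.atan((log_z - LOG_Z0) / DELTA_LOG_Z) / math.pi


def fe_over_zn(log_z):
    """Predicted [Fe/Zn] = log10(1 - F_Fe), using zinc as the undepleted reference."""
    return math.log10(1.0 - iron_dust_fraction(log_z))
```

By construction half of the iron is in dust at $[Z] = [Z]_0$, and the predicted [Fe/Zn] becomes more negative as the metallicity rises.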

2.1.2 Conversion from optical depth to detection probability 



Footnote: We could have included this estimation (of the best fit parameters in Section 2.1.1) in our full Bayesian formalism,
but due to the insensitivity of our results to the details of the relation
$\mathcal{F}_{\rm Fe}(Z)$, such an approach would add complexity without substantial benefit.

Figure 3. The completeness function for SDSS QSOs in the $i$ band (solid
line) compared to our simple analytic approach (dotted line). $\Delta m$ is the
change in the apparent $i$-magnitude, while the vertical axis shows the fraction
of QSOs that would be missed under such a reduction in brightness
assuming a step-function sensitivity. The differences between the exact and
simple analytic model are only significant when the probability of detection
falls below $10^{[\ldots]}$, and thus have little impact on our overall statistics.

We base our predictions for optical samples on the behaviour of
a survey for quasars in the SDSS $i$ band with a mean wavelength
$\lambda_i \simeq 7480$ Å. For our optical data (defined in section 2.2 below),
the mean redshift $\langle z_{\rm DLA}\rangle = 3.0$ translates into a DLA rest-frame
wavelength of $\lambda_0 = \lambda_i/(1 + \langle z_{\rm DLA}\rangle) \simeq 1900$ Å. This will be useful
in fixing a prior on $\tau_0$ later. Over the full range of our sample
($1.8 < z < 3.5$) the DLA rest-frame wavelength varies between
$1600 < \lambda_0/\text{Å} < 2700$. According to the low-metallicity extinction
law measured in the SMC (Small Magellanic Cloud; Pei 1992),
there is a factor two variation in the expected strength of dust absorption
over the interval $1600 < \lambda_0/\text{Å} < 2700$. However, we
have chosen not to implement the resulting redshift dependences
since:

(i) the optical metallicity data (section 2.2) are not compiled
from a single magnitude-limited sample, but rather from multiple
datasets so that an exact modelling is for all practical purposes impossible;

(ii) computationally, it would be extremely expensive to allow
redshift variation (since the normalization of the models would
need to be recalculated for every absorber, instead of once per
model - see equations 11 and 12 below);

(iii) the stochastic variation in other parameters (metallicity and
column density) dwarfs the maximum variation of a factor two; and

(iv) since the metallicity and column density evolution is known
to be weak (Wolfe et al. 2005), systematic biases are unlikely to
arise from neglecting slight redshift dependences.

For similar reasons, we do not allow ourselves to become fixated
on an exact modelling of the observed quasar background
luminosity function. We assume the background population of
quasars has an observed distribution in the SDSS $i$ band which
obeys $\log_{10} {\rm d}N/{\rm d}m = c + \beta m$ with $\beta \simeq 0.7$ (Richards et al. 2006)
and that the detection probability is one above a given brightness
threshold ($m < m_0$) and zero below ($m > m_0$). Then the total
number of quasars which can be observed in the absence of dust
obscuration is

$$N_{\rm tot,unobsc} = \int_{-\infty}^{m_0} 10^{c+\beta m}\,{\rm d}m. \tag{7}$$

When dust is introduced, the luminosity of a given system is
reduced by a factor $e^{-\tau}$; the apparent magnitude changes by
$\Delta m(\tau) = 2.5\tau/\ln 10$. The number of quasars which could be
observed in the presence of such absorption is

$$N_{\rm tot,obsc} = \int_{-\infty}^{m_0} 10^{c+\beta[m-\Delta m(\tau)]}\,{\rm d}m \tag{8}$$

so that the probability of detecting a system which has optical depth
$\tau$ in the $i$ band is reduced by the factor

$$p_{\rm detect}(\tau) = \frac{N_{\rm tot,obsc}}{N_{\rm tot,unobsc}} = \exp(-2.5\beta\tau). \tag{9}$$

While emphasis has previously been placed on departures of
${\rm d}N/{\rm d}m$ from power law behaviour and hence more complex forms
of $p_{\rm detect}$ (Ellison et al. 2004), we find that the actual effects are
well modelled by our approach (see Figure 3, which compares the
analytic and exact SDSS models). Other than simplicity for its own
sake, there is a tangible benefit to keeping the model basic: it can
be integrated partially analytically (see Section 2.1.4 below).
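
The step from power-law number counts to an exponential detection probability can be checked directly. The Python sketch below assumes the power-law counts and step-function selection described above; the values of `C` and `M0` are arbitrary placeholders (they cancel in the ratio), and the function names are illustrative.

```python
import math

BETA = 0.7   # slope of log10 dN/dm quoted in the text
C = -10.0    # hypothetical normalization of the counts; cancels in the ratio
M0 = 20.0    # hypothetical limiting magnitude; cancels in the ratio


def delta_mag(tau):
    """Change in apparent magnitude for optical depth tau: 2.5 tau / ln 10."""
    return 2.5 * tau / math.log(10.0)


def cumulative_counts(m_lim):
    """Integral of 10^(C + BETA m) dm from -infinity up to m_lim."""
    return 10.0 ** (C + BETA * m_lim) / (BETA * math.log(10.0))


def p_detect_from_counts(tau):
    """Only quasars intrinsically brighter than M0 - delta_mag remain visible."""
    return cumulative_counts(M0 - delta_mag(tau)) / cumulative_counts(M0)


def p_detect(tau):
    """Closed form exp(-2.5 beta tau)."""
    return math.exp(-2.5 * BETA * tau)
```

The two functions agree for any optical depth because $10^{-\beta\Delta m} = \exp(-2.5\beta\tau)$, which is what makes the later integral over column density tractable analytically.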

2.1.3 Completing the model 

The optically observed joint probability distribution
$n_{\rm DLA}(N_{\rm HI}, Z)$ is simply the product of the intrinsic $f_{\rm DLA}$
with the detection probability $p_{\rm detect}$,

$$n_{\rm DLA}(N_{\rm HI}, Z) = f_{\rm DLA}(N_{\rm HI}, Z) \times \exp\left[-2.5\beta\tau_0 N_{\rm HI}\frac{Z\,\mathcal{F}_{\rm Fe}(Z)}{Z_0\,\mathcal{F}_{\rm Fe}(Z_0)}\right] \tag{10}$$

which completes our model. The five free parameters that we have
introduced are summarised in Table 1.

In Figure 1 we have illustrated some distributions which can
be achieved by the model above. The dotted and solid contours
trace respectively lines of constant $f_{\rm DLA}$ (the intrinsic distribution,
equation 1) and $n_{\rm DLA}$ (which includes the effect of dust obscuration,
equation 10) - i.e. the former traces the distribution of
radio-selected DLAs and the latter that of optically-selected DLAs.
We have also plotted our optical sample (dots) and radio sample
(crosses or triangles for upper limits) on each panel - see Section
2.2 below for details of these datasets. The left panel shows
a model where $\tau_0 = 0$ (so no dust obscuration effects are in play).
One should note that, despite this, the contours show very small
probabilities for high $N_{\rm HI}$, high $Z$ absorbers simply because the
underlying separable distribution $f_N f_Z$ predicts few absorbers in
this region. The second panel shows a model similar to that used
by Vladilo & Péroux (2005), where no intrinsic cut-off occurs at
high $N_{\rm HI}$, but dust obscuration hides the higher column densities.
The third panel shows a combination of these two effects forming
the final distribution; our results (Section 3, Figure 4) will show
that such a combination is necessary to best describe the data.

2.1.4 Normalization 

In assessing our likelihood, we will split the data into two logically
distinct constraints: the total density of systems and the distribution
of systems within the $(N_{\rm HI}, Z)$ plane. For these purposes, we
require the normalizing constants

$$n_0 = \int {\rm d}N_{\rm HI}\,{\rm d}Z\; n_{\rm DLA}(N_{\rm HI}, Z) \tag{11}$$

$$f_0 = \int {\rm d}N_{\rm HI}\,{\rm d}Z\; f_{\rm DLA}(N_{\rm HI}, Z). \tag{12}$$

Because of the separability of the intrinsic distribution $f_{\rm DLA}$, its normalizing
constant $f_0$ may be calculated straightforwardly to be
Table 1. Summary of parameters and priors thereon for the DLA observation model.

Parameter              Equation   Description                                                     Prior
$\alpha$               (2)        Low $N_{\rm HI}$ slope                                          Flat; $-2.5 < \alpha <$ […]
$N_{\rm cut}$          (2)        Intrinsic $N_{\rm HI}$ rolloff scale                            Flat in log space; $N_{\rm cut} <$ […]
$\mu_Z$, $\sigma_Z$    (3)        Parameters for lognormal metallicity distribution               Flat priors; […] $< \mu_Z <$ […]; $0.1 < \sigma_Z < 3.0$
$\tau_0$               (4)        Dust optical depth normalization for SMC ($Z_0 = Z_\odot/6$)    Normal in $\log_{10}\tau_0/{\rm cm}^2$; $\mu = -21.7$, $\sigma = 1$


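$$f_0 = \sqrt{2\pi}\,\sigma_Z \ln 10\; N_{\rm cut}^{1+\alpha}\,\Gamma(1+\alpha, N_0/N_{\rm cut}) \tag{13}$$

where $N_0 = 2 \times 10^{20}\,{\rm cm}^{-2}$ is the DLA limiting column density
and $\Gamma$ is the incomplete gamma function,

$$\Gamma(a, x) = \int_x^\infty {\rm d}t\; t^{a-1} e^{-t} \tag{14}$$

for the evaluation of which we employ a standard numerical algorithm
(Press et al. 2007). For the obscured case we integrate analytically
over $N_{\rm HI}$, but the metallicity integral must be performed
numerically:

$$n_0 = \int_0^\infty {\rm d}Z\; f_Z(Z)\, N_{\rm eff}(Z)^{1+\alpha}\,\Gamma\left(1+\alpha, N_0/N_{\rm eff}(Z)\right) \tag{15}$$

where

$$N_{\rm eff}(Z)^{-1} = N_{\rm cut}^{-1} + 2.5\beta\tau_0\frac{Z\,\mathcal{F}_{\rm Fe}(Z)}{Z_0\,\mathcal{F}_{\rm Fe}(Z_0)} \tag{16}$$

and $f_Z(Z)$ is defined by equation (3).
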
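
The ratio $n_0/f_0$ of the obscured to intrinsic normalizations can be evaluated numerically along the following lines. This Python sketch is illustrative only and is not the implementation used for the analysis: it assumes the Schechter and lognormal forms of equations (2) and (3), the arctangent dust fraction of equation (5), Simpson-rule quadrature in place of a library incomplete gamma routine, and metallicities expressed in solar units. All function names are hypothetical.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def upper_gamma(a, x):
    """Upper incomplete gamma function for x > 0 and any real a.

    Uses the substitution t = exp(u); the integrand beyond t = x + 50 is
    negligible, so the integral is truncated there.
    """
    return simpson(lambda u: math.exp(a * u - math.exp(u)),
                   math.log(x), math.log(x + 50.0))


def dust_fraction(log_z, log_z0=-1.3, delta=0.48):
    """Assumed arctangent form for the fraction of iron in dust."""
    return 0.5 + math.atan((log_z - log_z0) / delta) / math.pi


def completeness(alpha, n_cut, mu_z, sigma_z, tau0,
                 beta=0.7, n_min=2e20, z_norm=1.0 / 6.0):
    """Ratio n_0 / f_0 of the obscured to intrinsic normalizations."""
    f_norm = dust_fraction(math.log10(z_norm))

    def integrand(log_z):
        z = 10.0 ** log_z
        # 1/N_eff = 1/N_cut + 2.5 beta tau0 Z F(Z) / (Z0 F(Z0))
        inv_n_eff = 1.0 / n_cut + (2.5 * beta * tau0 * z * dust_fraction(log_z)
                                   / (z_norm * f_norm))
        gauss = math.exp(-0.5 * ((log_z - mu_z) / sigma_z) ** 2)
        # (N_eff / N_cut)^(1 + alpha) * Gamma(1 + alpha, N_0 / N_eff)
        return (gauss * (n_cut * inv_n_eff) ** (-(1.0 + alpha))
                * upper_gamma(1.0 + alpha, n_min * inv_n_eff))

    # Integrate over [Z] = log10(Z / Z_sun); the ln(10) Jacobian and the
    # common factor N_cut^(1 + alpha) cancel in the ratio.
    obscured = simpson(integrand, mu_z - 6.0 * sigma_z, mu_z + 6.0 * sigma_z, 200)
    intrinsic = (math.sqrt(2.0 * math.pi) * sigma_z
                 * upper_gamma(1.0 + alpha, n_min / n_cut))
    return obscured / intrinsic
```

For example, `completeness(-1.7, 2e21, -1.2, 0.5, 10 ** -21.7)` returns the optically observable fraction for one hypothetical parameter set (these values are placeholders, not fitted results); setting `tau0` to zero recovers a ratio of one.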
2.2 Data and Likelihood 

Each model is assessed on four points corresponding to properties
of optically selected absorbers, properties of radio selected absorbers,
a comparison of the line densities of absorbers in these two
types of survey and finally SDSS constraints on the column density
distribution. The overall likelihood $\mathcal{L}$ is simply the product of the
four factors:

$$\mathcal{L} = \mathcal{L}_{\rm opt}\,\mathcal{L}_{\rm rad}\,\mathcal{L}_{\rm linedens}\,\mathcal{L}_{\rm SDSS} \tag{17}$$

with the terms formally defined below in equations (18)-(22).

We use data from high-resolution optical measurements of 123
DLAs based on the compilation by Dessauges-Zavadsky et al. (in
prep.) restricted to the redshift range $1.8 < z < 3.5$ to match the
approximate range of the CORALS radio sample (see below). For
each DLA we use as a measure of its metallicity the zinc abundance
relative to the preferred solar value $12 + \log_{10}({\rm Zn/H})_\odot =
4.63$ (from Lodders 2003); where zinc measurements are unavailable
we use the iron abundance normalized similarly by $12 +
\log_{10}({\rm Fe/H})_\odot = 7.47$. Zinc is not prone to deplete onto dust
grains but at lower column densities its transitions become too
weak for measurement; conversely, iron is typically disfavoured
as a metallicity indicator since it is refractory but the depletion is
small at low metallicities (see Figure 2, in which the depletion of
iron relative to zinc is plotted). Since our sample is dominated by
zinc measurements down to $Z \simeq Z_\odot/30$, the iron depletion should
not be a major concern. In any case, any systematic underestimates
of metallicities which may arise would apply equally to radio- and
optically-selected DLAs and should therefore not result in any substantive
systematic biases for our test.

Footnote: We also checked that we obtained compatible, although slightly less
well constrained, final results from the smaller optical sample described
by Prochaska et al. (2007).



Observers typically favour targeting high $N_{\rm HI}$ systems for
high resolution follow-up. For this reason, we do not allow the distribution
of $N_{\rm HI}$ values in the optical metallicity sample to affect
our statistics, instead restricting ourselves to measuring the likelihood
of each metallicity observation with the column density of the
responsible absorber as a given, i.e.

$$\mathcal{L}_{\rm opt} = \prod_i p(Z_i|N_i) = \prod_i \frac{n_{\rm DLA}(N_i, Z_i)}{\int_0^\infty n_{\rm DLA}(N_i, Z)\,{\rm d}Z} \tag{18}$$

where $i$ ranges over the optical sample and the final relation follows
from the conditional probability rule: $p(Z_i|N_i)\,p(N_i) =
p(Z_i \,\&\, N_i)$.
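
The conditional likelihood for the optical sample can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: it assumes the lognormal metallicity distribution and exponential dust suppression described in Section 2.1, works with the density per unit $[Z]$ (a reparameterization that leaves likelihood ratios unchanged), takes metallicities in solar units, and uses hypothetical function names.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def dust_fraction(log_z, log_z0=-1.3, delta=0.48):
    """Assumed arctangent form for the fraction of iron in dust."""
    return 0.5 + math.atan((log_z - log_z0) / delta) / math.pi


def z_density(log_z, n_hi, mu_z, sigma_z, tau0, beta=0.7, z_norm=1.0 / 6.0):
    """Observed density per unit [Z] at fixed N_HI, up to Z-independent factors."""
    z = 10.0 ** log_z
    dust = z * dust_fraction(log_z) / (z_norm * dust_fraction(math.log10(z_norm)))
    return math.exp(-0.5 * ((log_z - mu_z) / sigma_z) ** 2
                    - 2.5 * beta * tau0 * n_hi * dust)


def cond_prob(log_z, n_hi, mu_z, sigma_z, tau0):
    """p([Z] | N_HI): the column density factor f_N cancels in the ratio."""
    norm = simpson(lambda x: z_density(x, n_hi, mu_z, sigma_z, tau0),
                   mu_z - 8.0 * sigma_z, mu_z + 8.0 * sigma_z)
    return z_density(log_z, n_hi, mu_z, sigma_z, tau0) / norm


def log_like_opt(sample, mu_z, sigma_z, tau0):
    """Sum of log p([Z]_i | N_i) over (N_HI, [Z]) pairs."""
    return sum(math.log(cond_prob(lz, n, mu_z, sigma_z, tau0))
               for n, lz in sample)
```

Because only the conditional probability enters, any selection on column density in the follow-up sample drops out, which is the reason given above for constructing the likelihood this way.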

Metallicity data from the radio-selected CORALS survey are
taken from Akerman et al. (2005). As for the optical sample described
above, we use Zn or (where necessary) Fe to define the
metallicity. Unlike the optical case, each radio observation is assessed
jointly on its column density and metallicity since no column
density biases are expected. For two DLAs no abundances
have been measured and for a further two only an upper limit on the
metallicity is available. By noting that this situation corresponds to
an "infinite upper limit" on the metallicity of the former two DLAs,
we can include all systems consistently in the likelihood:

$$\mathcal{L}_{\rm rad} = \prod_i \frac{f_{\rm DLA}(N_i, Z_i)}{f_0} \times \prod_j \frac{1}{f_0}\int_0^{Z_{{\rm max},j}} f_{\rm DLA}(N_j, Z)\,{\rm d}Z \tag{19}$$

where $i$ ranges over the radio sample with measured metallicities,
$j$ ranges over those four with only upper limits, and $Z_{{\rm max},j}$ denotes the
upper limit on the metallicity of system $j$.

Separately from the $(N_{\rm HI}, Z)$ distribution of observed DLAs,
we should also consider the overall incidence rate in optical (e.g.
SDSS) and radio surveys. The incidence of DLAs in the SDSS
has been discussed extensively for the third data release (DR3) by
Prochaska et al. (2005); here we will make use of the updated statistics
for DR5.

Because the SDSS pathlength is very much larger than that of
any other survey, we make the simplifying assumption that there
is no error on its determination of the obscured rate of DLA incidence,
$\ell_{\rm obsc} \simeq 0.063$ over $2.2 < z < 3.5$. This quantity is a
measurement of the number density of DLAs per unit "absorption
distance" $X$, defined by ${\rm d}X/{\rm d}z = H_0(1+z)^2/H(z)$. The line
density of radio-selected quasar DLAs is increased by the ratio of
all DLAs to unobscured, i.e. $\ell_{\rm unobsc} = \ell_{\rm obsc} f_0/n_0$ (see equations
11 and 12). This follows because the line density is at first order proportional
to the normalizing constants $f_0$ and $n_0$ for the unobscured
and obscured cases respectively (see also discussion around equation
23).

Footnote (updated SDSS DR5 DLA statistics): www.ucolick.org/~xavier/SDSSDLA/DR5/

We make use of the CORALS (Ellison et al. 2001) results giving
a radio sample pathlength of $\Delta X = 195$ (assuming $\Omega_M = 0.3$,
$\Omega_\Lambda = 0.7$ whence ${\rm d}X/{\rm d}z \simeq 3.5$, any errors in which are
small compared to sample variance). Although Jorgenson et al.
(2006) present additional radio-selected DLA statistics, unlike the
CORALS results their optical identifications are incomplete and so
to be conservative we did not take advantage of the expanded sample.
The overall expected number of DLAs in the CORALS sample
is $\Lambda$ where

$$\Lambda = \ell_{\rm unobsc}\,\Delta X = \ell_{\rm obsc}\,\Delta X\,\frac{f_0}{n_0} = 12.3\,\frac{f_0}{n_0} \tag{20}$$

For each model $f_0/n_0$ and hence $\Lambda$ is determined; given the fixed
number of DLAs actually seen in the sample, $k = 17$, the corresponding
likelihood is given by the Poisson distribution

$$\mathcal{L}_{\rm linedens} = \frac{\Lambda^k e^{-\Lambda}}{k!} \tag{21}$$

We note that the mean redshift $\langle z\rangle$ of all DLAs in the two samples
(radio and $z$-limited SDSS) is respectively 2.5 and 2.9. Given the
very slow evolution of DLA incidence rate at high redshift, this
difference is unimportant.
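
The line-density term is simple enough to evaluate by hand, but a short Python sketch makes the arithmetic explicit. It uses the numbers quoted in the text (an obscured incidence rate of 0.063, a radio pathlength of 195 and 17 observed DLAs); the function names are illustrative, and the ratio $f_0/n_0$ must be supplied by the model.

```python
import math

ELL_OBSC = 0.063   # obscured DLA incidence rate from the SDSS, as quoted
DELTA_X = 195.0    # CORALS absorption pathlength, as quoted
K_OBSERVED = 17    # number of DLAs seen in the CORALS sample, as quoted


def expected_count(f0_over_n0):
    """Expected number of CORALS DLAs for a given ratio f_0 / n_0."""
    return ELL_OBSC * DELTA_X * f0_over_n0


def log_poisson(k, mean):
    """Log of the Poisson probability of k events given the expected mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_like_linedens(f0_over_n0):
    """Log-likelihood of the observed count for a given ratio f_0 / n_0."""
    return log_poisson(K_OBSERVED, expected_count(f0_over_n0))
```

With no obscuration ($f_0/n_0 = 1$) the expected count is about 12.3; taken in isolation, a Poisson term of this kind is maximised when the expected count equals the observed one, although in the full analysis it is only one of four factors.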

Finally, the SDSS data produce a joint constraint on the
strength of dust absorption $\tau_0$ and the Schechter function cut-off
$N_{\rm cut}$ through the distribution of column densities. Because our optical
data likelihood $\mathcal{L}_{\rm opt}$ does not take account of the distribution
of column densities, the SDSS survey may be regarded as an entirely
independent constraint with likelihood

$$\mathcal{L}_{\rm SDSS} = \prod_i \frac{1}{n_0}\int_0^\infty n_{\rm DLA}(N_i, Z)\,{\rm d}Z \tag{22}$$

where $i$ ranges over the 587 systems in the previously described
subset of the SDSS DR5 data.

For readers unused to the Bayesian approach to statistics, it 
may be a surprise that our likelihoods are a product of probability 
densities and therefore will vary under reparameterizations of the 
data. However the final analysis considers only ratios of probabili- 
ties for different models, for which the Jacobian factors cancel. 



2.3 Priors 

As discussed in the introduction, we are required to place prior 
probability distribution functions on our parameters to summarise 
known physics and observational constraints not included in the 
likelihood. We have little information on DLA dust absorption 
which is not used in our likelihood analysis, so most of our pri- 
ors are deliberately as neutral as possible, while limiting the values 
to reasonable physical expectations. 

Note that allowing the column density distribution function
cut-off to tend to infinity ($N_{\rm cut} \to \infty$) allows for significant numbers
of implausibly dense environments. A conservative prior from
observations of extreme astrophysical situations (in particular, active
galactic nuclei and gamma ray bursts) is that column densities
do not exceed $N_{\rm H} \sim 10^{[\ldots]}\,{\rm cm}^{-2}$, which is implemented by adopting
a flat log prior for $N_{\rm cut} < 10^{[\ldots]}\,{\rm cm}^{-2}$. In practice, the likelihood
is sharply peaked around $N_{\rm cut} \sim 10^{[\ldots]}\,{\rm cm}^{-2}$ so that our
results are insensitive to this choice.

The parameter controlling the strength of the dust extinction
effect, $\tau_0$, can be estimated. We use the SMC extinction curve (Pei
1992) at our mean rest-frame wavelength $\lambda_0 \simeq 1900$ Å (see section
2.1.2 above), gaining an optical depth per unit column density of
$\tau_0 \simeq 10^{-21.7}\,{\rm cm}^2$. If we
were confident of this estimate, we could fix $\tau_0 = 10^{-21.7}\,{\rm cm}^2$
and $Z_0 = Z_\odot/6$ (the SMC metallicity) in equation 4. However,
when searching for direct evidence of dust obscuration, this would
appear circular: $\tau_0$ should be allowed to vary. $Z_0$ does not need to
vary, even if we are unsure of the exact SMC metallicity, since a
misestimate can simply be absorbed into the posterior value of $\tau_0$
without affecting the observable predictions of the model.

Figure 4. Given the priors on $N_{\rm cut}$ and $\tau_0$ listed in Table 1, the 1 and $2\sigma$
contours for the marginalized Bayesian model fitting problem are shown
(solid contours, top panel). These show that our model best fits the observed
data when both an intrinsic drop in the number of high column density
DLAs and a dust bias are in effect. To illustrate the effect of our prior on
$\tau_0$, the dotted contours are lines of constant probability density for a flat log
prior on $\tau_0$. The lower panel shows the distribution marginalized over $N_{\rm cut}$
to give posteriors on $\tau_0$. The horizontal axis is $\log_{10}\tau_0/{\rm cm}^2$.

With our caveat of circularity in mind, it is tempting to try and
place some form of uniform prior on $\ln\tau_0$ - but this is impossible,
since as the effect tends to zero ($\ln\tau_0 \to -\infty$), the models become
indistinguishable in their predictions for a finite data-set and the
likelihood density becomes constant. One must therefore be careful
to assign a prior with finite integral as $\ln\tau_0 \to -\infty$ but which is
not so sharp as to exclude the possibility of an unexpected result.
This will anyway reflect substantial uncertainties in our estimate
of $\tau_0$ (and $Z_0$). Thus we assign a generous order of magnitude uncertainty
at the $1\sigma$ level, making the prior on $\log_{10}\tau_0/{\rm cm}^2$ normal
with mean $\mu = -21.7$ and standard deviation $\sigma = 1.0$ dex. The effect of this
prior on the results is discussed in more detail in Section 3 below.
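
The prior on the dust normalization described above can be written as a normalized Gaussian in $\log_{10}\tau_0/{\rm cm}^2$. The short Python sketch below is illustrative only (the function name is an assumption); it uses the mean of -21.7 and 1 dex width quoted in the text.

```python
import math


def log_prior_log_tau0(log_tau0, mean=-21.7, sigma=1.0):
    """Log of a normal prior density in log10(tau_0 / cm^2)."""
    return (-0.5 * ((log_tau0 - mean) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))
```

Unlike a prior that is flat in $\ln\tau_0$, this density has a finite integral as $\tau_0 \to 0$, so the posterior remains normalizable even where the likelihood becomes constant.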

We have assigned flat priors to the remaining parameters
which control the intrinsic metallicity model and the weak end of
the column density distribution (see Table 1). These are well constrained
by the data; consequently the priors do not impact strongly
on our final results.



3 RESULTS 

The result of our analysis is shown in the (To,Acut) plane 
(marginalized over all other parameters) by the solid contours in 
the upper panel of Figure|4] These contain 68% and 95% of the to- 
tal probability, corresponding to 1 and 2a limits respectively. We 



Such an unexpected result would likely point to a deficiency in the model. 
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Obscuration-induced shift (dex) 
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 



V 




0.01 



0.4 

F (fraction observed) 

Figure 5. The posterior probability for an optical fractional completeness 
of at most F is plotted against F for various quantities described in the 
text (see equations |23| - [27} . 1, 2 and 3o" completeness lower limits are 
given by the intersection of these curves with the grey horizontal lines from 
top to bottom respectively. Our results show that we are unlikely to miss a 
substantial number of DLA systems (^dla is almost complete); however 
metallicity-weighted measures can be more substantially underestimated 
(although not by orders of magnitude as previously claimed). See also Ta- 
bleU 



have shown, in the lower panel, the results of additionally integrating
over $N_{\rm cut}$ to gain a completely marginalized distribution
for $\tau_0$. This is plotted as $-\sqrt{2\ln(\pi_0/\pi)}$, where $\pi$ is the
posterior probability density in $\log_{10}\tau_0$ and $\pi_0 = \max(\pi)$ is an
arbitrary normalization scale. (For a normal distribution, the plotted values
$-1, -2, \ldots$ would thus correspond to $1\sigma, 2\sigma, \ldots$ limits.) The peak
in this quantity shows that the posterior distribution strongly suggests
dust absorption with the favoured value of $\log_{10}\tau_0 \simeq -21.8$;
this is very close to the value estimated earlier for the SMC
normalization, showing that the model produces results in close accordance
with expectations.
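The mapping from posterior density to the plotted significance scale can be checked with a short sketch. The helper names below are illustrative assumptions; the check uses a unit normal density, for which a point $n\sigma$ from the peak should map to $-n$.

```python
import math


def sigma_equivalent(pi: float, pi_max: float) -> float:
    """Return -sqrt(2 ln(pi_max / pi)) for a density value pi and its peak pi_max."""
    if not 0.0 < pi <= pi_max:
        raise ValueError("require 0 < pi <= pi_max")
    return -math.sqrt(2.0 * math.log(pi_max / pi))


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution, used here only as a test case."""
    t = (x - mu) / sigma
    return math.exp(-0.5 * t * t) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    peak = normal_pdf(0.0)
    for n in (1, 2, 3):
        # For a normal density, a point n sigma from the peak maps to -n.
        print(n, sigma_equivalent(normal_pdf(float(n)), peak))
```

For a non-Gaussian posterior the same transformation is still well defined, but the plotted values are then only an approximate guide to confidence levels.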

However, having used our estimate to place the prior on $\tau_0$,
it is legitimate to be concerned that our results simply reflect this
prior and hence that the data are not actually constraining the
problem. To demonstrate that this is not the case we have also plotted
results from assuming a constant log prior on $\tau_0$ (dotted lines in
Figure 4). The main peak remains, i.e. it is driven by the
likelihood, showing that our prior, as expected, simply cuts off the
otherwise infinite distribution as $\tau_0 \to 0$ (the dotted posterior can
be seen to attain a constant value in the bottom panel, and the $2\sigma$
contours do not close in the top panel). This confirms the satisfying
result that dust obscuration of the strength implied by SMC
observations is favoured independently when analysed with our model.
We emphasize, however, that the flat prior (dotted contour) results
cannot be used in our final assessment for reasons described in
Section 2.3. (The dotted contours do not contain a finite probability, but
are chosen to correspond to the same probability densities as their
solid counterparts.)



3.1 Limits on Optical Completeness 

The most important consequence of a dust obscuration scenario is
that various cosmological measurements may be biased. In the
following, we will express quantities as functionally dependent on the
unnormalized distribution function $\phi$, where $\phi = f_{\rm DLA}$ for a radio
selected survey or $\phi = n_{\rm DLA}$ for an optically selected survey. We
are particularly interested in the overall incidence rate of DLAs,

$$\ell_{\rm DLA}[\phi] \propto \int \phi(N_{\rm HI}, Z)\,{\rm d}N_{\rm HI}\,{\rm d}Z = \phi_0, \qquad (23)$$

noting that $\phi_0 = f_0$ for $\phi = f_{\rm DLA}$ and $\phi_0 = n_0$ for $\phi = n_{\rm DLA}$; see
equations 11 and 12. Also of interest is the total mass density
of neutral hydrogen in DLAs,

$$\Omega_{\rm DLA}[\phi] \propto \int N_{\rm HI}\,\phi(N_{\rm HI}, Z)\,{\rm d}N_{\rm HI}\,{\rm d}Z; \qquad (24)$$

the mean metallicity of DLAs,

$$\langle Z\rangle[\phi] \propto \int Z\,\phi(N_{\rm HI}, Z)\,{\rm d}N_{\rm HI}\,{\rm d}Z / \phi_0; \qquad (25)$$

the total mass density of metals in DLAs,

$$\Omega_{Z,{\rm DLA}}[\phi] \propto \int Z\,N_{\rm HI}\,\phi(N_{\rm HI}, Z)\,{\rm d}N_{\rm HI}\,{\rm d}Z; \qquad (26)$$

and the column-density weighted mean metallicity of DLAs,

$$\langle Z\rangle_{N_{\rm HI}}[\phi] = \frac{\Omega_{Z,{\rm DLA}}[\phi]}{\Omega_{\rm DLA}[\phi]}. \qquad (27)$$

Note that the missing constants of proportionality in equations 23-26
do not depend on $\phi$. For more on the physical significance of
these definitions, see Wolfe et al. (2005).

The fractional completeness $F$ of any of these measurements
$M$ as measured by an optical survey is defined as

$$F(M) = \frac{M[n_{\rm DLA}]}{M[f_{\rm DLA}]}, \qquad (28)$$

where $n_{\rm DLA}$ and $f_{\rm DLA}$ are the optical and intrinsic distributions,
defined by equations 10 and 1 respectively. $F(M)$ depends on
our parameters $\mathbf{r} = \{\alpha, N_{\rm cut}, \tau_0, Z_0, \sigma_Z\}$; the probability
distribution for $Q = F(M)$ is written

$$p(<Q_0) = \int {\rm d}^5 r\, p(\mathbf{r})\,\theta(Q_0 - Q(\mathbf{r})) \simeq \frac{1}{N}\sum_{i=1}^{N} \theta(Q_0 - Q(\mathbf{r}_i)), \qquad (29)$$

where $\theta$ is the Heaviside step function, $p(\mathbf{r})$ is the posterior
probability density, $Q$ is any quantity dependent on the parameters $\mathbf{r}$
and $Q_0$ is a value for which the cumulative probability $p(<Q_0)$ is
being calculated; when evaluating from the MCMC chain, $i$ ranges
over the models in the chain and $N$ is the number of steps. We will
also be interested in the expected value of $Q$,

$$E(Q) = \int {\rm d}^5 r\, p(\mathbf{r})\,Q(\mathbf{r}) \simeq \sum_{i=1}^{N} Q_i / N. \qquad (30)$$
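The chain estimators in equations 29 and 30 amount to counting and averaging over the steps of the Markov chain. The sketch below assumes an equally weighted chain in which each stored step contributes once; the function names and the short list of completeness values are purely illustrative and are not output from the analysis described here.

```python
from typing import Sequence


def cumulative_probability(q_chain: Sequence[float], q0: float) -> float:
    """Estimate p(<Q0): the fraction of chain steps with Q(r_i) <= Q0."""
    if not q_chain:
        raise ValueError("empty chain")
    return sum(1 for q in q_chain if q <= q0) / len(q_chain)


def expected_value(q_chain: Sequence[float]) -> float:
    """Estimate E(Q) as the mean of Q over the chain steps."""
    if not q_chain:
        raise ValueError("empty chain")
    return sum(q_chain) / len(q_chain)


if __name__ == "__main__":
    # Hypothetical completeness values Q_i = F(M) evaluated at five chain steps.
    chain = [0.95, 0.90, 0.99, 0.85, 0.93]
    print(cumulative_probability(chain, 0.92))
    print(expected_value(chain))
```

A lower limit at a given confidence level then follows by finding the value of $Q_0$ at which the cumulative probability crosses the corresponding threshold.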



The results for the five quantities defined in equations 23-27
are shown in Figure 5 and Table 2. For each quantity $M$, the plot

Footnote: Equation 23 assumes that DLAs are lost from optical surveys in direct
proportion to $1 - p_{\rm detect}$, ignoring the second-order effect from the
reduction in the number of observed quasars. A full calculation shows that errors
introduced by neglecting this term are at the percent level.
Table 2. Expected values of, and 1-$3\sigma$ confidence intervals for, various quantities. The first five rows specify the fractional completeness of optical estimates
for the specified quantities. The final two rows refer to the expected log mean observable dust absorption specified as the reddening $E_{B-V}$ of the background
quasar in the rest-frame of the DLA (see also Figure 6).

Quantity (Q)                                  Expected   Confidence intervals
                                              value      67%             95%             99.7%

$F(\ell_{\rm DLA})$ (a)                       0.93       (0.90, 1.00)    (0.81, 1.00)    (0.72, 1.00)
$F(\Omega_{\rm DLA})$ (b)                     0.87       (0.81, 0.97)    (0.70, 1.00)    (0.58, 1.00)
$F(\langle Z\rangle)$ (c)                     0.75       (0.67, 1.00)    (0.44, 1.00)    (0.26, 1.00)
$F(\langle Z\rangle_{N_{\rm HI}})$ (d)        0.63       (0.43, 0.82)    (0.32, 1.00)    (0.18, 1.00)
$F(\Omega_{Z,{\rm DLA}})$ (e)                 0.56       (0.30, 0.75)    (0.22, 0.96)    (0.11, 1.00)
$\log_{10}\langle E_{B-V}\rangle$ (optical)   -2.4       (-2.8, -1.8)    (-3.2, -1.8)    (-4.3, -1.6)
$\log_{10}\langle E_{B-V}\rangle$ (radio)     -2.1       (-2.5, -1.5)    (-3.5, -1.1)    (-4.5, -0.8)

(a) The overall completeness of the optical sample, i.e. the ratio of the line density estimated from optical samples to the intrinsic line density.
(b) The fractional completeness of optical estimates of the total comoving density of H I in DLAs.
(c) The ratio of the mean metallicity measured in optical samples to the intrinsic value.
(d) The ratio of the mean column density weighted metallicity measured in optical samples to the intrinsic value.
(e) The fractional completeness of optical estimates of the total comoving density of metals in DLAs.



shows the cumulative probability $p({\rm completeness} < F)$ and the
table specifies the expected value $E(F)$ along with confidence
intervals at 1, 2, $3\sigma$ (i.e. $(F_0, F_1)$ such that $p(F_0 < {\rm completeness} <
F_1) = 67\%$, 95% and 99.7%). The immediate conclusion is that
optical samples are likely to be biased but at a level significantly
smaller than many previous studies have claimed. Simple quantities
such as the overall incidence rate of DLAs ($\ell_{\rm DLA}$) are likely
to be almost unaffected (with only a 7% expected underestimate
and < 10%, 19% at 1, $2\sigma$ confidence). On the other hand, quantities
which are weighted towards higher column densities of metals
suffer more from the effects of obscuration. The total DLA mass
in H I ($\Omega_{\rm DLA}$) is unlikely to have been underestimated by more
than 30% ($2\sigma$ limit), but a heavily weighted quantity such as the
total mass of metals in DLAs ($\Omega_{Z,{\rm DLA}}$) is underestimated by about
a factor of two, or at most 78% ($2\sigma$). Note that this nonetheless
results in a relatively modest worst-case shift in the mean metallicity
of < 0.4 dex ($2\sigma$ limit) or < 0.5 dex for the column density
weighted metallicity ($2\sigma$ limit).
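The dex shifts quoted above follow directly from the completeness limits in Table 2, since a ratio $F$ of measured to intrinsic mean metallicity corresponds to a shift of $\log_{10} F$. A minimal sketch of this conversion (the function name is an illustrative assumption):

```python
import math


def completeness_to_dex(f: float) -> float:
    """Logarithmic shift (dex) implied by a fractional completeness f."""
    if f <= 0.0:
        raise ValueError("completeness must be positive")
    return math.log10(f)


if __name__ == "__main__":
    # 2-sigma lower limits and expected value taken from Table 2.
    print(completeness_to_dex(0.44))  # mean metallicity, 2-sigma limit
    print(completeness_to_dex(0.32))  # column density weighted metallicity, 2-sigma limit
    print(completeness_to_dex(0.75))  # mean metallicity, expected value
```

The first two values lie just inside the quoted limits of 0.4 and 0.5 dex, and the third reproduces the expected shift of roughly 0.1 dex.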



In some cases, we can compare the completeness limits derived
above with analogous estimates by other authors. Trenti
& Stiavelli (2006) compared the column density distributions of
SDSS DR3 and CORALS radio samples, concluding that optical
determinations of $\Omega_{\rm DLA}$ underestimated the true value by around
15%, very close to our own expected value of 13%. This is perhaps
unsurprising since the information used in this earlier work is a
subset of our own dataset. Estimates which take into account the
optically determined metallicity distribution (but not the comparison
with radio-selected quasars) are to be found in Vladilo & Péroux
(2005). The authors give values for the completeness of $\ell_{\rm DLA}$ of
50-70% and claim $\Omega_{\rm DLA}$ is underestimated by at least 50% (their
section 6.5). These estimates are inconsistent with our $3\sigma$ limits
for the minimum completeness, and differ substantially from our
expected value of $E[F(\Omega_{\rm DLA})] = 87\%$. Further, although Vladilo
& Péroux (2005) did not give completeness statistics comparable to
ours for their metallicity distributions, they suggest metallicities are
underestimated by factors of 5 to 6, a shift of about 0.8 dex, again
incompatible with our $3\sigma$ limits. We explore possible explanations
for these differences in Section 4.



3.2 Expected Dust Reddening 

Because the optical depth of dust rises rapidly towards shorter
wavelengths, observed quasars obscured by dust are expected to
exhibit statistically redder spectra than their unobscured
counterparts. This effect is discussed in the introduction, but we did not
use the results of recent dust reddening studies (Vladilo et al. 2008;
Murphy & Liske 2004) as priors in our model, since the uncertainties
of these authors' analyses are quite different in nature from
the uncertainties in our model. However, we should check that our
results are indeed compatible with the observed reddening effect.

We caution that our estimate will assume a proportionality
between the colour shift $E_{B-V}$ in the DLA rest-frame and the
strength of the overall obscuration, calculated according to the
SMC extinction law measured at 1900 Å. This assumption is not
fully justified given the differing DLA redshifts over the sample
although, as before, we expect that performing the calculation
assuming mean values in this way should not introduce a substantial
bias.

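The expected reddening effect of a DLA on the background
quasar is

$$\langle E_{B-V}\rangle[\phi] = \left(\frac{E_{B-V}}{\tau(1900\,\text{Å})}\right)_{\rm SMC} \tau_0 \int {\rm d}N_{\rm HI}\,{\rm d}Z\, \frac{N_{\rm HI}\,Z\,\mathcal{F}_{\rm Fe}(Z)}{\mathcal{F}_{\rm Fe}(Z_\odot)\,Z_\odot\,\phi_0}\,\phi(N_{\rm HI}, Z), \qquad (31)$$

where the first factor is evaluated from the SMC reddening curve
giving $E_{B-V}/\tau(1900\,\text{Å}) \simeq 0.12$, the normalizing constant $\phi_0$ is
defined as usual (equation 23) and $\phi = f_{\rm DLA}$ for a radio survey or
$\phi = n_{\rm DLA}$ for an optical survey.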

The posterior distribution is evaluated according to equation 29,
setting $Q = \langle E_{B-V}\rangle$; the results are shown in Figure 6
with confidence intervals listed in Table 2. It is satisfying that
our $1\sigma$ interval $-2.6 < \log_{10}\langle E_{B-V}\rangle < -2.0$ for an optical
survey agrees with the result $\log_{10}\langle E_{B-V}\rangle = -2.2 \pm 0.1$ of Vladilo
et al. (2008) (and is consistent with the upper limit of Murphy &
Liske 2004). The expected reddening effect of DLAs in radio
samples (dashed line in Figure 6) is more pronounced than that in
optical samples (solid curve), since the average radio-selected DLA
will have a higher column density of metals (no dust bias). This is,
however, compatible with the limit on radio-selected DLA reddening
$\langle E_{B-V}\rangle < 0.04$ from Ellison et al. (2005).



3.3 Internal Consistency and Driving Factors 

Given that previous studies of the ($N_{\rm HI}$, $Z$) evidence for dust obscuration
have generally pointed to more pronounced effects than
indicated by radio-selected surveys (see Introduction), we should 
check that this tension is not present in our analysis; if so this could 
point to a deficiency in the model, limiting the usefulness of the 
results. A severe tension, with one dataset requiring different pa- 
rameters from others, would result in the posterior parameters be- 
ing pushed to intermediate values incompatible with estimates from 
individual likelihood terms in equation ([TtJ. 

The optical completeness in our final analysis is expected to
be $\sim 90\%$-$100\%$ (Table 2), giving an expected number of
radio-selected DLAs $11.8 \lesssim \lambda \lesssim 13.1$. Comparing to the actual number,
$k = 17$, shows that the radio observations actually detect a slightly
larger number than our model has predicted; in other words, they
prefer stronger dust absorption. However, because of the small path
length of existing radio surveys, the Poisson likelihood (equation 21) has a
large standard deviation, $\sigma = \sqrt{\lambda} \simeq 3.5$. The consequence of this is that
the overall $1\sigma$ region is almost entirely contained within the $1\sigma$
region for the line density data. This shows there is not a substantive
tension between these datasets in our analysis. Because of this
very consistency (coupled with the wide variance), excluding the
line density likelihood from the final analysis makes only minor
differences to the results (a fact we explicitly verified); however we
have retained it for completeness.
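The size of the Poisson scatter relative to the excess of observed radio-selected DLAs can be illustrated with the numbers quoted above. This is a minimal sketch; the function name is an illustrative assumption, and only the values $k = 17$ and $11.8 \lesssim \lambda \lesssim 13.1$ come from the text.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """Natural log of the Poisson probability of k events given mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


K_OBSERVED = 17               # radio-selected DLAs quoted in the text
LAMBDA_RANGE = (11.8, 13.1)   # expected counts quoted in the text

if __name__ == "__main__":
    for lam in LAMBDA_RANGE:
        sigma = math.sqrt(lam)                  # Poisson standard deviation
        excess = (K_OBSERVED - lam) / sigma     # observed excess in units of sigma
        prob = math.exp(poisson_log_pmf(K_OBSERVED, lam))
        print(f"lambda={lam:.1f} sigma={sigma:.2f} excess={excess:.2f} P(k=17)={prob:.3f}")
```

For both ends of the quoted range the observed count lies within roughly one to one and a half standard deviations of the expectation, which is why the line density term adds little constraining power.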

It is worth briefly investigating which data are most powerful
in producing constraints on dust effects, especially bearing in mind
our comments in the Introduction that the apparent anticorrelation
of $N_{\rm HI}$ and $Z$ in optically selected DLA samples can be reproduced
without any dust effects whatsoever (see Figure 1). Concretely, our
optical sample of zinc and iron metallicities (Section 2.2) is
calculated to have a Spearman rank correlation statistic $r = 0.055$,
giving a two-tailed p-value of 0.55 (i.e. a sample of the same size
with random uncorrelated values will show the same or greater
levels of apparent correlation in more than 55% of cases). This leads
to the expectation that, on its own, the optical data can place only
an upper limit on the effect of dust obscuration. We explicitly
verified that this is the case by running our analysis without any radio
constraints.

In fact, the major factor in determining our results is the
comparison of radio-observed and optically-observed distributions of
column densities and metallicities. These lead to the positive
detection of dust obscuration effects even with neutral priors that allow
for no dust in DLAs whatsoever (dotted blue lines in Figure 4; see
Section 3). Because this detection so closely matches estimates
calculated from observations of the interstellar medium in the SMC,
we may have some confidence that our final results are meaningful.



4 DISCUSSION AND CONCLUSIONS 

In this work, we have analysed radio- and optically-selected DLA 
samples to produce an overall picture of dust obscuration. We 
first noted that the distribution of optically selected DLAs in the
($N_{\rm HI}$, $Z$) plane does not point unambiguously to significant dust
obscuration. In fact, it is quite possible to form reasonable models 
in which high metallicity, high H I column density DLAs are rarely 
Figure 6. Dust reddening in our models. The solid curve gives the
probability density, according to our posterior distribution, of various mean
reddenings expected in an optical sample (comparing quasar spectra to their
appearance in the absence of a DLA). The dashed curve gives the same
statistics for a radio sample (no dust bias). The vertical dotted line gives the
measured reddening in SDSS DR5 (Vladilo et al. 2008), while the
dash-dotted line gives the direct upper limit on radio-selected quasar reddening
derived by Ellison et al. (2005). We emphasize that our model fitting is
performed without using any such constraints, so that the agreement of both
statistics is an independent validation of our results.



seen simply because the product of the metallicity and column density
distributions is small in this region (Figure 1, left panel).

We assembled a simple model of DLA dust obscuration in
which the intrinsic DLA distribution is separable in the ($N_{\rm HI}$, $Z$)
plane and a tuneable dust parameter $\tau_0$ obscures a variable
fraction of DLAs from optical samples based on the total column
density in dust (modelled as $N_{\rm HI}\,Z\,\mathcal{F}_{\rm Fe}(Z)$). We then assessed this
model using a Bayesian parameter estimation approach with a
likelihood based on four sets of observational data: an
optically-selected sample of column densities and metallicities (based on
Dessauges-Zavadsky et al., in prep.); an equivalent radio-selected
survey (CORALS; Ellison et al. 2001; Akerman et al. 2005); the
SDSS statistics for observed column densities of DLAs (Prochaska
et al. 2005); and a comparison of the incidence rates of DLAs
towards optical- and radio-selected quasars (Ellison et al. 2001).

Table 2 summarises the observational predictions of our
model. The results do not allow a large hidden fraction of DLAs;
thus simple quantities such as their line density ($\ell_{\rm DLA}$) are not
significantly underestimated in existing surveys. Quantities weighted
towards higher metal column densities are (as expected) less well
constrained by optical surveys. However, there is still relatively
little room for manoeuvre; in particular, our statistics for the
metallicity suggest that substantial (i.e. $\gtrsim 1$ dex) dust-induced corrections
of the type sometimes invoked to reconcile models with data (e.g.
Cen et al. 2003; Nagamine et al. 2004) are not supported by the
data.

Our model is similar to that of Vladilo & Péroux (2005;
henceforth VP05) but we arrive at some qualitatively different
conclusions. This may be due to the expanded sample now available, but
it is worth noting that our analysis also differs in many details:

(i) We have used a Bayesian approach, being careful to avoid 
focussing on our peak likelihood model but rather analysing the 
entire posterior distribution. This leads to well-defined statistical 
limits on the effects under consideration. 

(ii) We have used a substantially larger optical sample of DLAs
and additionally considered radio-selected and SDSS observations.
We found that on their own optical samples are rather poor at
constraining the effects of dust (Section 3.3), whereas adding radio
samples returns results which are promisingly consistent with
estimates from sightlines through the SMC.

(iii) We formulate the likelihood for each observation rather
than simply fitting the distribution using a minimization
technique. For instance, one may not assume that the statistics of high
resolution optical samples trace the underlying $N_{\rm HI}$ distribution
since observers choose their targets using a variety of criteria. Such
an assumption leads VP05 to estimate a shallower distribution of
$N_{\rm HI}$ values than is revealed by the SDSS (power-law indices $-1.5$
and $-1.8$ respectively), for example. This could plausibly bias the
estimation of dust effects.

(iv) We have used a lognormal distribution for metallicity which
we argue eliminates some further biases (equation 3 and discussion
thereafter, or for more details see Appendix A). Also, we have used
iron abundances where zinc abundances are unavailable, since this traces the
low metallicity end of the distribution. While iron is refractory and
therefore can underestimate true metallicities, this is a small effect
at low metallicities where zinc becomes systematically hard to
detect (Figure 2). We believe this situation is preferable to ignoring
the low metallicity tail of the distribution.

(v) For the column density distribution, we allowed for an
exponential cut-off at $N_{\rm HI} \gtrsim N_{\rm cut}$ (equation 2). If this parameter were
unnecessary, our posteriors would have automatically pushed $N_{\rm cut}$
to high values, but this was not the case (Figure 4). (The
likelihood is also sufficiently peaked that only an extreme prior would
reverse this trend.) Removing the intrinsic exponential cut-off in
$N_{\rm HI}$ would have at least two problematic effects. Firstly, it forces
a substantial increase in the deduced effects of dust obscuration,
since these alone must account for the drop in observed high $N_{\rm HI}$
absorbers (as illustrated by the central panel of Figure 1). Secondly,
it causes estimates for completeness of weighted quantities such as
$\Omega_{\rm DLA}$ to converge extremely slowly, with a substantial contribution
arising from extremely high column densities. VP05 impose
arbitrary cut-offs at high $N_{\rm HI}$ to estimate such effects but the results
are sensitive to the cut-off chosen.

(vi) We have used a somewhat simplified obscuration model, ar- 
guing that fine details are absorbed into our parameter definitions 
and do not affect estimates for quantities of observational interest. 

There are two notable omissions in our modelling. Firstly, we
assumed that the intrinsic cutoff $N_{\rm cut}$ was not dependent on
metallicity. But taking seriously the suggestion of Schaye (2001) that the
physical mechanism for preventing arbitrarily high $N_{\rm HI}$ absorbers
is the conversion of H I into H$_2$, one would expect the
characteristic transition column density to be linked to the presence of dust
(an essential catalyst in the efficient production of H$_2$). In fact, this
would give a neat explanation for the coincidence of intrinsic and
dust-induced cut-offs (by which we mean $N_{\rm cut} \sim \tau_0^{-1}$). But if
this effect depends on metallicity, as is plausible, it should
introduce intrinsic correlations in the ($N_{\rm HI}$, $Z$) plane; these would be in
the same sense as dust obscuration effects. Since our likelihood is
largely controlled by the comparison of radio and optical data
(Section 3.3), which would not essentially be changed in such a scenario,
it is likely that our analysis is robust. Nonetheless, without a more
specific physical model it is hard to assess this in more detail.

Secondly, we have not included a model of gravitational
lensing by the host halos of DLAs. There is some evidence in the SDSS
sample of a correlation between $N_{\rm HI}$ and the background quasar
luminosity which can be explained by this effect (Murphy & Liske
2004; Prochaska et al. 2005). Our simulations (Pontzen et al. 2008)
suggest that, in fact, the metallicity $Z$ of a system is a better
indication of its mass than $N_{\rm HI}$. Thus any lensing effect will presumably
be correlated with metallicity. If so, the resulting entanglement may
cause us to underestimate the dust bias, although if the processes
genuinely compensate each other, the completeness limits are
unchanged! (Gravitational lensing being monochromatic, this could
only work in one waveband.) A full assessment of this possibility
awaits future work.

It seems likely that DLA dust biasing is a real but minor effect;
all observational constraints are essentially consistent with this
conclusion. The fractional completeness of optically-determined
values for observable quantities depends on their weighting towards
higher metal column densities. The least affected quantity is the
overall incidence rate $\ell_{\rm DLA}$, which is expected to be 93% complete;
the most affected quantity is the mass of metals in DLAs, $\Omega_{Z,{\rm DLA}}$,
which is nonetheless expected to be underestimated only by a factor
of about two.
APPENDIX A: CHOICE OF FITTING FUNCTIONS

The choice of Schechter and lognormal distribution functions
(equations 2 and 3) for our intrinsic column density and
metallicity distributions respectively is somewhat arbitrary and we should
ensure our choice is as fair as possible given our prior knowledge.

The column density distribution choice is relatively easy to
justify. We know that the obscured distribution as uncovered by the
SDSS statistics is well fitted by a Schechter function (Prochaska
et al. 2005), which consists of a power-law suppressed at high
column densities by an exponential decline. This decline can arise due
to an intrinsic cut-off ($N_{\rm cut}$) or due to the exponential dust
suppression term in equation 10. We verified that our choice of column
density distribution function does not bias our results by using only
the SDSS likelihood (equation 22) to produce a posterior prediction
for $\tau_0$, which simply returned our prior.

The form of the metallicity distribution presents more
serious difficulties. Because of the small-number statistics in the radio
samples, we know most about the optically determined (obscured)
distribution, which reads

$$n_Z(Z) \propto \int {\rm d}N_{\rm HI}\, f_Z(Z)\, f_N(N_{\rm HI})\, p_{\rm detect}[\tau(N_{\rm HI}, Z)]. \qquad (A1)$$

It should be clear that the intrinsic distribution $f_Z(Z)$ is only
recoverable from the data once we know the strength of the dust
absorption. Even then, with finite statistics one can never rule out the
existence of a population of very high metallicity absorbers which
are hidden from view. We therefore need to make an ad hoc
parameterization of $f_Z$ which encapsulates our prejudice that (i) the
distribution function should change smoothly and (ii) the distribution
function is likely to be unimodal. We will not discuss any models
which fail to satisfy these conditions, but accept they could change
results substantially.

For our main results, we chose to use the lognormal
distribution. Vladilo & Péroux (2005) contended that a Schechter
function provides a more generic fit, arguing that the shape of the
high- and low-metallicity tails can be independently controlled. However,
since both lognormal and Schechter fits have only two
parameters, this claim should be interpreted cautiously. Given any
two-parameter fit, once the mean and variance are specified the exact
distribution function and hence its higher moments (such as the
skewness) are fixed. Thus we should investigate which distribution
function better encapsulates our knowledge of the systems; if
necessary, extra parameters can then be introduced to compensate for
deficiencies. The lognormal distribution is a fairly generic choice;
further it is supported as a choice for $f_Z$ by simulations (Pontzen
et al. 2008), although it is hard to know how much weight to assign
to such support.

With our current data we find that the Schechter function
provides a very poor representation of the obscured distribution
$n_Z(Z)$. Figure A1 (left panel) shows the best fit lognormal and
Schechter distributions; with flat priors on the expectation and
variance, the latter distribution is disfavoured in log evidence by more
than 10, i.e. the probability of the data arising given the latter
distribution is more than 20,000 times smaller. Employing this function
for the intrinsic distribution $f_Z(Z)$ is therefore likely to bias results
against any scenario in which $n_Z \simeq f_Z$, i.e. where dust obscuration
is small.
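The correspondence between a log evidence difference and a probability ratio is a one-line calculation. The sketch below assumes the quoted difference is in natural logarithms, which is consistent with the factor of more than 20,000 given above; the function name is illustrative.

```python
import math


def evidence_ratio(delta_ln_evidence: float) -> float:
    """Ratio of evidences corresponding to a difference in natural-log evidence."""
    return math.exp(delta_ln_evidence)


if __name__ == "__main__":
    # A difference of 10 in ln(evidence) corresponds to a ratio just above 22,000.
    print(evidence_ratio(10.0))
```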

However, accepting that the lognormal distribution may be
too restrictive a form for $f_Z$ (even if it fits $n_Z$ well) we
investigated the effect of generalising the metallicity distribution to a
three-parameter family of distributions which allow for skewing the
underlying $f_Z$. For this purpose, we have used a log skew-normal
distribution. The skew-normal distribution (see Azzalini 2005 and
references therein) is written

$$f(Z; \zeta, M, S) = \frac{2}{S}\,\psi\!\left(\frac{Z - M}{S}\right)\Psi\!\left(\zeta\,\frac{Z - M}{S}\right), \qquad (A2)$$

where $\psi$ and $\Psi$ are respectively the probability density and
cumulative probability of the normal distribution. It is remarkable, but
simple to show, that this distribution is normalized for all values of
$\zeta$. For $\zeta = 0$, the distribution is exactly normal; as $\zeta \to +\infty, -\infty$
one obtains the half-normal distribution for $Z > M$ and $Z < M$
respectively. In between these extremes, $\zeta$ smoothly interpolates
between models of varying skewness.
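The normalization property of the skew-normal distribution is easy to confirm numerically. The following sketch uses the standard form of the density with a simple trapezoidal integration; the function names, the integration range and the example parameter values are illustrative assumptions.

```python
import math


def normal_pdf(t: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t: float) -> float:
    """Standard normal cumulative probability."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def skew_normal_pdf(z: float, zeta: float, m: float, s: float) -> float:
    """Skew-normal density with skewness zeta, location m and scale s."""
    t = (z - m) / s
    return 2.0 / s * normal_pdf(t) * normal_cdf(zeta * t)


def integrate(func, a: float, b: float, n: int = 20000) -> float:
    """Trapezoidal rule on n equal intervals between a and b."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


if __name__ == "__main__":
    m, s = -1.0, 0.5   # hypothetical location and scale
    for zeta in (-5.8, 0.0, 5.8):
        area = integrate(lambda z: skew_normal_pdf(z, zeta, m, s), m - 12 * s, m + 12 * s)
        print(zeta, area)   # each area should be very close to 1
```

Setting `zeta = 0` reduces the density to an ordinary normal distribution, since the cumulative factor is then exactly one half.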

When $\zeta$ is allowed to take any value it is possible to find
models with large tails of high metallicity DLAs in which dust
obscuration makes the optical distribution compatible with the data. An
extreme case is illustrated in the right panel of Figure A1: the dashed
line shows the intrinsic (strongly skewed) distribution while the
solid line shows the observed (dust obscured, nearly symmetric)
distribution.

The radio sample is somewhat too small to fully rule out such
cases, but we should impose a prior reflecting our knowledge of
metallicities in the Universe. In particular, it would be extremely
surprising to find a significant number of systems with $Z > 10 Z_\odot$
(see, e.g., Thomas et al. 2005, in which the centres of $z = 0$ early
type galaxies are shown not to exceed even $3 Z_\odot$). Therefore, in a
test run of our Markov chain, we allowed $\zeta$ to vary with uniform
prior but imposed a "brick wall": models predicting greater than
one in 1000 intrinsic DLA systems of $Z > 10 Z_\odot$ were given zero
prior probability. This is, of course, an arbitrary choice and will be
model-dependent in its implications. But it is a simple first-order
approximation, allowing a model with more complex behaviour
while imposing our knowledge of direct observations of galaxies.
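A "brick wall" prior of this kind can be sketched as a simple accept or reject test on each proposed model. The example below assumes that the skew-normal form applies to $x = \log_{10}(Z/Z_\odot)$, so that the wall sits at $x = 1$; the function names and the example parameter values (other than the threshold of one in 1000) are hypothetical and are not taken from the analysis itself.

```python
import math


def skew_normal_pdf(x: float, zeta: float, m: float, s: float) -> float:
    """Skew-normal density in x = log10(Z / Zsun) (assumed parameterization)."""
    t = (x - m) / s
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(zeta * t / math.sqrt(2.0)))
    return 2.0 / s * pdf * cdf


def tail_fraction(zeta: float, m: float, s: float, x_min: float = 1.0, n: int = 20000) -> float:
    """Fraction of intrinsic systems with log10(Z / Zsun) > x_min (trapezoidal rule)."""
    upper = max(x_min, m + 12.0 * s)   # density is negligible beyond this point
    if upper <= x_min:
        return 0.0
    h = (upper - x_min) / n
    total = 0.5 * (skew_normal_pdf(x_min, zeta, m, s) + skew_normal_pdf(upper, zeta, m, s))
    for i in range(1, n):
        total += skew_normal_pdf(x_min + i * h, zeta, m, s)
    return total * h


def passes_brick_wall(zeta: float, m: float, s: float, max_fraction: float = 1e-3) -> bool:
    """True if at most max_fraction of systems exceed 10 Zsun, so the prior is non-zero."""
    return tail_fraction(zeta, m, s) <= max_fraction


if __name__ == "__main__":
    print(passes_brick_wall(0.0, -1.0, 0.5))   # symmetric, narrow model (hypothetical)
    print(passes_brick_wall(5.8, -1.5, 1.5))   # strongly skewed, broad model (hypothetical)
```

In a Markov chain, a proposed step failing this test would simply be assigned zero prior probability and therefore rejected.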

Comparing this choice with our main ($\zeta = 0$) results, the
differences in our posterior distribution were at the percent level
and made no difference to our qualitative conclusions presented in
the main paper. As the high-metallicity wall is relaxed, allowing
Figure A1. The points with error bars show our optical sample of metallicities based on Dessauges-Zavadsky et al. (in prep.) and described in Section 2.2. (Note
that the binning is for illustrative purposes only and is not part of the analysis.) In the left panel the solid and dashed lines show simple best fit lognormal and
Schechter distributions respectively. The Schechter fit to the observed optical distribution is strongly disfavoured (see text for details) and therefore employing
this function for the intrinsic distribution may artificially disfavour small bias scenarios. In the right panel, we illustrate a model in which the underlying
metallicity distribution $f_Z$ (shown by the dashed line) is strongly skewed in log space, but dust absorption hides the long tail to high metallicities in optically
selected surveys (solid line). In the illustrated model, the skewness parameter $\zeta$ is 5.8 and the dust obscuration ($\log_{10}\tau_0 = -22.0$) hides the tail almost completely.
This model should be discounted by a prior on allowed metallicities: even if the radio sample of DLAs is not strong enough to rule it out, the model includes
significant numbers of DLAs with $Z \gg 3 Z_\odot$, greater than the values measured in even the most massive galaxies (Thomas et al. 2005).



more $Z > 10 Z_\odot$ systems, the constraints are weakened; if one
imposes no such prior, allowing systems of arbitrarily high
metallicity, $1\sigma$ confidence intervals become $0.83 < F(\ell_{\rm DLA}) < 0.95$,
$0.34 < F(\langle Z\rangle) < 0.69$ and $0.12 < F(\Omega_{Z,{\rm DLA}}) < 0.50$.
However, we emphasize that much of the obscured cross-section is then
in exceptionally high metallicity DLAs with $Z > 10 Z_\odot$; such a
model seems very unlikely.

In future, it will be possible to place tighter constraints
on these model freedoms by obtaining expanded samples of
DLAs from radio-selected QSO spectra. Although further blind
radio surveys are relatively slow to reduce the variance of
incidence rate statistics (fractional errors for $N_{\rm radio}$ DLAs scale as
$1/\sqrt{N_{\rm radio}}$), high resolution follow-up spectroscopy greatly
increases the model-discerning power of the radio observations.
Simulations based on our peak posterior model showed that, with an
increase in sample size to $N_{\rm radio} = 35$ (approximately twice the
current number of CORALS DLAs with measured metallicities),
models with a high-metallicity tail could be independently rejected by
the DLA sample. Conversely, if a significant high-metallicity
skew-normal tail exists but is hidden in optical samples, such a modestly
expanded radio sample would be sufficient to reveal its existence.
We therefore encourage observers to pursue further searches for
DLAs in complete (i.e. fully optically identified) samples of
radio-loud QSOs.
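The slow improvement of incidence rate statistics with sample size follows from simple Poisson counting. As a minimal sketch (the function name is an illustrative assumption), the fractional error for a count of $N_{\rm radio}$ DLAs can be tabulated as:

```python
import math


def fractional_poisson_error(n_dla: int) -> float:
    """Fractional counting error 1/sqrt(N) for a sample of n_dla systems."""
    if n_dla <= 0:
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(n_dla)


if __name__ == "__main__":
    for n in (17, 35, 70, 140):
        print(n, round(fractional_poisson_error(n), 3))
```

Halving the fractional error requires a sample four times larger, which is why follow-up spectroscopy of existing radio-selected systems is the more efficient route to discriminating between models.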